76 research outputs found

    Komunikazioaren teoriaren oinarriak

    This book is the guide for the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (University of the Basque Country, UPV/EHU). The contents therefore match those designed by the lecturers who have taught the lectures since the new curriculum was introduced. The Communication Theory course covers the basic concepts of telecommunications. Taking a formal and mathematical point of view as its starting point, it describes the basic mechanisms that transmit information in modern telecommunication systems (digital radio and television, data transmission, telephone communications, and so on).

    Fundamentos de teoría de la comunicación

    135 p. This book is a guide for the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (Universidad del País Vasco/Euskal Herriko Unibertsitatea). The content is therefore the one designed by the teaching staff responsible for the lectures since the current curriculum was introduced. The Communication Theory course describes, from a formal and mathematical point of view, the basic mechanisms that make it possible to transmit information in modern telecommunication systems (digital radio and television, data transmission, telephone communications, etc.).

    Evaluation of Tacotron Based Synthesizers for Spanish and Basque

    In this paper, we describe the implementation and evaluation of Text to Speech synthesizers based on neural networks for Spanish and Basque. Several voices were built, all of them using a limited amount of data. The system applies Tacotron 2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as a neural vocoder to obtain the audio signals from the spectrograms. The limited amount of training data leads to synthesis errors in some sentences. To detect those errors automatically, we developed a new method that finds the sentences that have lost the alignment during the inference process. To mitigate the problem, we implemented a guided attention that provides the system with the explicit duration of the phonemes. The resulting system was evaluated to assess its robustness, quality and naturalness with both objective and subjective measures. The results reveal the capacity of the system to produce good-quality, natural audio. This work was funded by the Basque Government (Project refs. PIBA 2018-035, IT-1355-19). This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033.

    Modelo de duración para conversión de texto a voz en euskera

    This paper presents the modelling of phone durations in standard Basque, to be included in a text-to-speech system. The statistical modelling was done using binary regression trees and a large corpus containing 57,300 phones. Several prediction experiments were performed, testing different sets of predicting factors. The durations predicted with this model have an RMSE of 22.23 ms. This work was partially funded by the Ministerio de Ciencia y Tecnología (TIC2000-1005-C03-03 and TIC2000-1669-C04-03).
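
    As a reminder of what the reported figure measures, the sketch below computes a root-mean-square error between predicted and observed phone durations. It is only an illustration: the function name `rmse` and the sample durations are hypothetical and are not taken from the paper or its corpus.

```python
import math


def rmse(predicted_ms, observed_ms):
    """Root-mean-square error between two equal-length lists of durations (ms)."""
    if len(predicted_ms) != len(observed_ms) or not predicted_ms:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of the squared differences, then the square root.
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted_ms, observed_ms)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical durations in milliseconds, for illustration only.
predicted = [60.0, 80.0]
observed = [63.0, 76.0]
print(round(rmse(predicted, observed), 2))
```

    Because the errors are squared before averaging, a few badly mispredicted phones weigh more heavily on the RMSE than many small deviations.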

    Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target

    Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve intelligibility and quality. We have used a neural network based voice conversion method with the aim of improving the intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition systems (ASR), an objective intelligibility metric (STOI) and a subjective test. ASR evaluation shows that the proposed system had significantly better word recognition accuracy compared to unprocessed OS, and baseline systems which used aligned healthy speech as the target. There was an improvement of at least 15% on STOI scores indicating a higher intelligibility for the proposed system compared to unprocessed OS, and a higher target similarity in the proposed system compared to baseline systems. The subjective test reveals a significant preference for the proposed system compared to unprocessed OS for all OS speakers, except one who was the least proficient OS speaker in the data set. This project was supported by funding from the European Union’s H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu (accessed on 25 June 2021)), and the Basque Government (PIBA_2018_1_0035 and IT355-19).

    Intelligibility and Listening Effort of Spanish Oesophageal Speech

    Communication is a huge challenge for oesophageal speakers, be it for interactions with fellow humans or with digital voice assistants. We aim to quantify these communication challenges (both human-human and human-machine interactions) by measuring intelligibility and Listening Effort (LE) of Oesophageal Speech (OS) in comparison to Healthy Laryngeal Speech (HS). We conducted two listening tests (one web-based, the other in laboratory settings) to collect these measurements. Participants performed a sentence recognition and LE rating task in each test. Intelligibility, calculated as Word Error Rate, showed significant correlation with self-reported LE ratings. Speaker type (healthy or oesophageal) had a major effect on intelligibility and effort. More LE was reported for OS compared to HS even when OS intelligibility was close to HS. Listeners familiar with OS reported less effort when listening to OS compared to nonfamiliar listeners. However, such advantage of familiarity was not observed for intelligibility. Automatic speech recognition scores were higher for OS compared to HS. This project was supported by funding from the EU's H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu), the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (DL4NLP KK-2019/00045, PIBA_2018_1_0035 and IT355-19).
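
    For readers unfamiliar with the metric, the sketch below shows one common way to compute a Word Error Rate: the word-level edit distance between a reference sentence and a listener's (or recogniser's) transcript, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical, and the study's own scoring procedure may differ in details such as text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentences: one of six reference words is missing.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

    A lower value means more words were recognised correctly; the value can exceed 1.0 when the transcript contains many inserted words.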

    Frame-Based Phone Classification Using EMG Signals

    This paper evaluates the impact of inter-speaker and inter-session variability on the development of a silent speech interface (SSI) based on electromyographic (EMG) signals from the facial muscles. The final goal of the SSI is to provide a communication tool for Spanish-speaking laryngectomees by generating audible speech from voiceless articulation. However, before moving on to such a complex task, a simpler phone classification task in different modalities regarding speaker and session dependency is performed for this study. These experiments consist of processing the recorded utterances into phone-labeled segments and predicting the phonetic labels using only features obtained from the EMG signals. We evaluate and compare the performance of each model considering the classification accuracy. Results show that the models are able to predict the phonetic label best when they are trained and tested using data from the same session. The accuracy drops drastically when the model is tested with data from a different session, although it improves when more data are added to the training data. Similarly, when the same model is tested on a session from a different speaker, the accuracy decreases. This suggests that using larger amounts of data could help to reduce the impact of inter-session variability, but more research is required to understand if this approach would suffice to account for inter-speaker variability as well. This research was funded by Agencia Estatal de Investigación grant number ref.PID2019-108040RB-C21/AEI/10.13039/50110001103

    Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures

    Speech is the most common way of communication among humans. People who cannot communicate through speech due to partial or total loss of the voice can benefit from Alternative and Augmentative Communication devices and Text to Speech technology. One problem of using these technologies is that the included synthetic voices might be impersonal and badly adapted to the user in terms of age, accent or even gender. In this context, the use of synthetic voices from voice banking systems is an attractive alternative. New voices can be obtained by applying adaptation techniques to recordings from people with healthy voices (donors) or from the users themselves before they lose their own voice. In this way, the goal is to offer a wide voice catalog to potential users. However, as there is no control over the recording or the adaptation processes, some method to control the final quality of the voice is needed. We present the work developed to automatically select the best synthetic voices using a set of objective measures and a subjective Mean Opinion Score (MOS) evaluation. A prediction algorithm for the MOS has been built which correlates similarly to the most correlated individual measure. This work has been funded by the Basque Government under the project ref. PIBA 2018-035 and IT-1355-19. This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033.